A Comparison of Inference Techniques for Semi-supervised Clustering with Hidden Markov Random Fields

نویسندگان

  • Mikhail Bilenko
  • Sugato Basu
چکیده

Recently, a number of methods have been proposed for semi-supervised clustering that employ supervision in the form of pairwise constraints. We describe a probabilistic model for semisupervised clustering based on Hidden Markov Random Fields (HMRFs) that incorporates relational supervision. The model leads to an EMstyle clustering algorithm, the E-step of which requires collective assignment of instances to cluster centroids under the constraints. We evaluate three known techniques for such collective assignment: belief propagation, linear programming relaxation, and iterated conditional modes (ICM). The first two methods attempt to globally approximate the optimal assignment, while ICM is a greedy method. Experimental results indicate that global methods outperform the greedy approach when relational supervision is limited, while their benefits diminish as more pairwise constraints are provided.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Combinatorial Markov Random Fields

A combinatorial random variable is a discrete random variable defined over a combinatorial set (e.g., a power set of a given set). In this paper we introduce combinatorial Markov random fields (Comrafs), which are Markov random fields where some of the nodes are combinatorial random variables. We argue that Comrafs are powerful models for unsupervised and semi-supervised learning. We put Comraf...

متن کامل

INRIA Research Project Proposal mistis Modelling and Inference of Complex and Structured Stochastic Systems

5 Domains of research 10 5.1 Mixture models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 5.1.1 Learning and classification techniques . . . . . . . . . . . . . . . . . . 11 5.1.2 Taking into account the curse of dimensionality. . . . . . . . . . . . 12 5.2 Markov models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.2.1 Triplet Markov Fields f...

متن کامل

Clustering Heterogeneous Data with Mutual Semi-supervision

We propose a new methodology for clustering data comprising multiple domains or parts, in such a way that the separate domains mutually supervise each other within a semi-supervised learning framework. Unlike existing uses of semi-supervised learning, our methodology does not assume the presence of labels from part of the data, but rather, each of the different domains of the data separately un...

متن کامل

Probabilistic Semi-Supervised Clustering with Constraints

Unsupervised clustering can be significantly improved using supervision in the form of pairwise constraints, i.e., pairs of instances labeled as belonging to same or different clusters. In recent years, a number of algorithms have been proposed for enhancing clustering quality by employing such supervision. Such methods use the constraints to either modify the objective function, or to learn th...

متن کامل

Hidden Markov Random Fields Based LSI Text Semi-supervised Clustering

Semi-supervised learning is an active research field. Previous results shown that unite background information into the original unsupervised clustering problem could archive higher accuracy. In this paper, we explore the cooperation between the pairwise constrains given by the user and the sematic information in natural language. In addition, we reduce the time complexity to make the algorithm...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004